Use of contexts in language model interpolation and adaptation

Authors

  • Xunying Liu
  • Mark J. F. Gales
  • Philip C. Woodland
Abstract

Language models (LMs) are often constructed by building multiple individual component models that are combined using context independent interpolation weights. By tuning these weights, using either perplexity or discriminative approaches, it is possible to adapt LMs to a particular task. This paper investigates the use of context dependent weighting in both interpolation and test-time adaptation of language models. Depending on the preceding word context, a discrete history weighting function is used to adjust the contribution of each component model. As this dramatically increases the number of parameters to estimate, robust weight estimation schemes are required, and several approaches are described in this paper. The first is based on MAP estimation, in which the interpolation weights of lower order contexts are used as smoothing priors. The second uses training data to ensure robust estimation of the LM interpolation weights; these weights can also serve as a smoothing prior for MAP adaptation. A normalized perplexity metric is proposed to handle the bias of the standard perplexity criterion with respect to corpus size. A range of schemes for combining weight information obtained from training data and from test data hypotheses is also proposed to improve robustness during context dependent LM adaptation. In addition, a minimum Bayes' risk (MBR) based discriminative training scheme and an efficient weighted finite state transducer (WFST) decoding algorithm for context dependent interpolation are presented. The proposed techniques were evaluated on a state-of-the-art Mandarin Chinese broadcast speech transcription task. Relative character error rate (CER) reductions of up to 7.3% were obtained, as well as consistent perplexity improvements.

Tel: +44 1223 766512; Fax: +44 1223 332662. Preprint submitted to Computer Speech and Language, June 6, 2012.
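
The context dependent weighting and MAP smoothing summarised above can be illustrated with a small sketch. It assumes linear interpolation, P(w | h) = sum over m of lambda_m(h) * P_m(w | h), where the weights lambda_m(h) depend on the last k words of the history h. The empty-history weights are ordinary context independent weights, estimated by EM to maximise tuning-data likelihood (equivalently, to minimise perplexity). The weights of each longer context are then MAP-estimated by EM with the next-shorter context's weights as a Dirichlet-style prior: lambda_m(h) = (sum of component-m posteriors over events with context h + tau * lambda_m(h')) / (C(h) + tau), where h' drops the oldest word of h. This follows the spirit of the first approach in the abstract but is not the authors' implementation: the two toy component models, `tau = 5`, the back-off lookup and every function name are hypothetical, and the normalized perplexity metric, MBR training and WFST decoding are not reproduced.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

History = Tuple[str, ...]
Event = Tuple[str, History]                     # (word, preceding history)
ComponentLM = Callable[[str, History], float]   # returns P_m(w | h) > 0


def suffix(h: History, k: int) -> History:
    """Last k words of a history; k = 0 gives the empty context ()."""
    return tuple(h[len(h) - k:])


def em_step(components: Sequence[ComponentLM], lam: List[float],
            events: Sequence[Event]) -> List[float]:
    """E-step: sum each component's posterior responsibility over the events."""
    acc = [0.0] * len(components)
    for w, h in events:
        joint = [wt * c(w, h) for wt, c in zip(lam, components)]
        total = sum(joint)
        for m, j in enumerate(joint):
            acc[m] += j / total
    return acc


def estimate_weights(components: Sequence[ComponentLM], events: Sequence[Event],
                     max_order: int = 2, tau: float = 5.0,
                     iters: int = 20) -> Dict[History, List[float]]:
    """Context dependent weights lambda_m(h), MAP-smoothed towards shorter contexts."""
    n_comp = len(components)
    # Order 0: context independent weights by ML EM (minimises tuning-set perplexity).
    lam = [1.0 / n_comp] * n_comp
    for _ in range(iters):
        lam = [a / len(events) for a in em_step(components, lam, events)]
    weights: Dict[History, List[float]] = {(): lam}
    for k in range(1, max_order + 1):
        groups: Dict[History, List[Event]] = defaultdict(list)
        for w, h in events:
            if len(h) >= k:
                groups[suffix(h, k)].append((w, h))
        for ctx, group in groups.items():
            prior = weights[ctx[1:]]  # weights of the next-lower-order context
            lam = list(prior)
            for _ in range(iters):
                acc = em_step(components, lam, group)
                # MAP M-step: tau pseudo-counts pull lambda(ctx) towards the prior.
                lam = [(a + tau * p) / (len(group) + tau) for a, p in zip(acc, prior)]
            weights[ctx] = lam
    return weights


def lookup(weights: Dict[History, List[float]], h: History, max_order: int = 2) -> List[float]:
    """Use the longest context seen during estimation, backing off to ()."""
    for k in range(min(max_order, len(h)), 0, -1):
        if suffix(h, k) in weights:
            return weights[suffix(h, k)]
    return weights[()]


def perplexity(components: Sequence[ComponentLM], weights: Dict[History, List[float]],
               events: Sequence[Event], max_order: int = 2) -> float:
    log_sum = 0.0
    for w, h in events:
        lam = lookup(weights, h, max_order)
        log_sum += math.log(sum(wt * c(w, h) for wt, c in zip(lam, components)))
    return math.exp(-log_sum / len(events))


def make_events(words: List[str], n: int = 3) -> List[Event]:
    """(word, history) pairs, histories truncated to the n - 1 previous words."""
    return [(w, tuple(words[max(0, i - n + 1):i])) for i, w in enumerate(words)]


# Hypothetical toy component LMs over the vocabulary {a, b, c}.
def uniform_lm(w: str, h: History) -> float:
    return 1.0 / 3


def repeat_lm(w: str, h: History) -> float:
    if not h:
        return 1.0 / 3
    return 0.8 if w == h[-1] else 0.1  # favours repeating the previous word


if __name__ == "__main__":
    comps = [uniform_lm, repeat_lm]
    tune = make_events("a a a a b c b c b c a a a".split())
    w_cd = estimate_weights(comps, tune)
    w_ci = {(): w_cd[()]}  # context independent baseline: global weights only
    print("global weights:   ", [round(x, 3) for x in w_cd[()]])
    print("weights after 'a':", [round(x, 3) for x in w_cd[("a",)]])
    print("weights after 'c':", [round(x, 3) for x in w_cd[("c",)]])
    print("PPL (CI):", round(perplexity(comps, w_ci, tune), 3))
    print("PPL (CD):", round(perplexity(comps, w_cd, tune), 3))
```

With the pseudo-count `tau`, a rarely seen context stays close to its lower-order weights, while a frequent context can move further towards its own maximum-likelihood weights. On the tuning data itself the context dependent perplexity cannot exceed the context independent one, because each MAP-EM run starts at the prior weights and cannot end with a lower tuning likelihood than they give; on held-out data no such guarantee holds, which is why robust smoothing matters.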

Similar resources

Problems Associated with the Use of Communicative Language Teaching in EFL Contexts and Possible Solutions

If the target of foreign language teaching is to use the language, communicative language teaching (CLT) seems to be an ideal teaching model. The goal of teaching with this method is to use the language as a medium of communication (Adi, 2012). The application of the communicative approach in teaching English as a foreign language, however, is associated with some problems that can cause the met...

Comparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts

This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as Foreign Language (EFL) learners' collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...

Context dependent language model adaptation

Language models (LMs) are often constructed by building multiple component LMs that are combined using interpolation weights. By tuning these interpolation weights, using either perplexity or discriminative approaches, it is possible to adapt LMs to a particular task. In this work, improved LM adaptation is achieved by introducing context dependent interpolation weights. An important part of th...

N-gram Adaptation with Dynamic Interpolation Coefficient Using Information Retrieval Technique

This study presents an N-gram adaptation technique when additional text data for the adaptation do not exist. We use a language modeling approach to the information retrieval (IR) technique to collect the appropriate adaptation corpus from baseline text data. We propose to use a dynamic interpolation coefficient to merge the N-gram, where the interpolation coefficient is estimated from the word...

Journal title:
  • Computer Speech & Language

Volume: 27

Publication date: 2009